
    Optimally splitting cases for training and testing high dimensional classifiers

    Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate? Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study to better understand the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in more cases assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.

    Defining adequate contact for transmission of Mycobacterium tuberculosis in an African urban environment

    Background: The risk of infection from respiratory pathogens increases according to the contact rate between the infectious case and susceptible contact, but the definition of adequate contact for transmission is not standard. In this study we aimed to identify factors that can explain the level of contact between tuberculosis cases and their social networks in an African urban environment. Methods: This was a cross-sectional study conducted in Kampala, Uganda from 2013 to 2017. We carried out an exploratory factor analysis (EFA) of social network data from tuberculosis cases and their contacts. We evaluated the factorability of the data to EFA using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO). We used principal axis factoring with oblique rotation to extract and rotate the factors, then we calculated factor scores for each using the weighted sum scores method. We assessed construct validity of the factors by associating the factors with other variables related to social mixing. Results: Tuberculosis cases (N = 120) listed their encounters with 1154 members of their social networks. Two factors were identified: the first, named “Setting”, captured 61% of the variance, whereas the second, named “Relationship”, captured 21%. Median scores for the Setting and Relationship factors were 10.2 (IQR 7.0, 13.6) and 7.7 (IQR 6.4, 10.1) respectively. Setting and Relationship scores varied according to the age, gender, and nature of the relationship among tuberculosis cases and their contacts. Family members had a higher median Setting score (13.8, IQR 11.6, 15.7) than non-family members (7.2, IQR 6.2, 9.4). The median Relationship score in family members (9.9, IQR 7.6, 11.5) was also higher than in non-family members (6.9, IQR 5.6, 8.1). For both factors, household contacts had higher scores than extra-household contacts (p < .0001). Contacts of male cases had a lower Setting score than contacts of female cases. In contrast, contacts of male and female cases had similar Relationship scores. Conclusions: In this large cross-sectional study from an urban African setting, we identified two factors that can assess adequate contact between tuberculosis cases and their social network members. These findings also confirm the complexity and heterogeneity of social mixing.

    Statistical methodology for the analysis of dye-switch microarray experiments

    Background: In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results: We present two original procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion: The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs.

    Validity of an isometric mid-thigh pull dynamometer in male youth athletes

    The purpose of the present study was to investigate the validity of an isometric mid-thigh pull dynamometer against a criterion measure (i.e., 1,000 Hz force platform) for assessing muscle strength in male youth athletes. Twenty-two male adolescent (age 15.3 ± 0.5 years) rugby league players performed four isometric mid-thigh pull efforts (i.e., two on the dynamometer and two on the force platform) separated by 5 minutes rest in a randomised and counterbalanced order. Mean bias, typical error of estimate (TEE) and Pearson correlation coefficient for peak force (PF) and peak force minus body weight (PFBW) from the force platform were validated against peak force from the dynamometer (DynoPF). When compared to PF and PFBW, mean bias (with 90% confidence limits) for DynoPF was very large (-32.4 [-34.2 to -30.6] %) and moderate (-10.0 [-12.8 to -7.2] %), respectively. The TEE was moderate for both PF (8.1 [6.3 to 11.2] %) and PFBW (8.9 [7.0 to 12.4] %). Correlations between DynoPF and PF (r = 0.90 [0.79 to 0.95]) and PFBW (r = 0.90 [0.80 to 0.95]) were nearly perfect. The isometric mid-thigh pull assessed using a dynamometer underestimated PF and PFBW obtained using a criterion force platform. However, strong correlations between the dynamometer and force platform suggest that a dynamometer provides an appropriate alternative to assess isometric mid-thigh pull strength when a force platform is not available. Therefore, practitioners can use an isometric mid-thigh pull dynamometer to assess strength in the field with youth athletes but should be aware that it underestimates peak force.

    Human basal-like breast cancer is represented by one of the two mammary tumor subtypes in dogs.

    Background: About 20% of breast cancers in humans are basal-like, a subtype that is often triple-negative and difficult to treat. An effective translational model for basal-like breast cancer (BLBC) is currently lacking and urgently needed. To determine whether spontaneous mammary tumors in pet dogs could meet this need, we subtyped canine mammary tumors and evaluated the dog-human molecular homology at the subtype level. Methods: We subtyped 236 canine mammary tumors from 3 studies by applying various subtyping strategies on their RNA-seq data. We then performed PAM50 classification with canine tumors alone, as well as with canine tumors combined with human breast tumors. We identified feature genes for human BLBC and luminal A subtypes via machine learning and used these genes to repeat canine-alone and cross-species tumor classifications. We investigated differential gene expression, signature gene set enrichment, expression association, mutational landscape, and other features for dog-human subtype comparison. Results: Our independent genome-wide subtyping consistently identified two molecularly distinct subtypes among the canine tumors. One subtype is mostly basal-like and clusters with human BLBC in cross-species PAM50 and feature gene classifications, while the other subtype does not cluster with any human breast cancer subtype. Furthermore, the canine basal-like subtype recaptures key molecular features (e.g., cell cycle gene upregulation, TP53 mutation) and gene expression patterns that characterize human BLBC. It is enriched in histological subtypes that match human breast cancer, unlike the other canine subtype. However, about 33% of canine basal-like tumors are estrogen receptor negative (ER-) and progesterone receptor positive (PR+), which is rare in human breast cancer. Further analysis reveals that these ER-PR+ canine tumors harbor additional basal-like features, including upregulation of genes of interferon-γ response and of the Wnt-pluripotency pathway. Interestingly, we observed an association of PGR expression with gene silencing in all canine tumors and with the expression of T cell exhaustion markers (e.g., PDCD1) in ER-PR+ canine tumors. Conclusions: We identify a canine mammary tumor subtype that molecularly resembles human BLBC overall and thus could serve as a vital translational model of this devastating breast cancer subtype. Our study also sheds light on the dog-human difference in the mammary tumor histology and the hormonal cycle.

    Characterization of Shewanella oneidensis MtrC: a cell-surface decaheme cytochrome involved in respiratory electron transport to extracellular electron acceptors

    MtrC is a decaheme c-type cytochrome associated with the outer cell membrane of Fe(III)-respiring species of the Shewanella genus. It is proposed to play a role in anaerobic respiration by mediating electron transfer to extracellular mineral oxides that can serve as terminal electron acceptors. This work presents the first spectropotentiometric and voltammetric characterization of MtrC, using protein purified from Shewanella oneidensis MR-1. Potentiometric titrations, monitored by UV–vis absorption and electron paramagnetic resonance (EPR) spectroscopy, reveal that the hemes within MtrC titrate over a broad potential range spanning approximately +100 to approximately -500 mV (vs. the standard hydrogen electrode). Across this potential window the UV–vis absorption spectra are characteristic of low-spin c-type hemes and the EPR spectra reveal broad, complex features that suggest the presence of magnetically spin-coupled low-spin c-hemes. Non-catalytic protein film voltammetry of MtrC demonstrates reversible electrochemistry over a potential window similar to that disclosed spectroscopically. The voltammetry also allows definition of kinetic properties of MtrC in direct electron exchange with a solid electrode surface and during reduction of a model Fe(III) substrate. Taken together, the data provide quantitative information on the potential domain in which MtrC can operate.

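    Valtion aluehallintovirastot ja niiden ylijohtajat: Pohjoiseurooppalainen analogia Ranskan prefeikteille [State Regional Administrative Agencies and their Chief Directors: a Northern European analogy to the French prefects]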

    This chapter examines the closest Finnish analogy to the French function of the prefect. In Finland, since 2010, this function has been vested in the institution of the State Regional Administrative Agency (SRAA, aluehallintovirasto, ‘AVI’). There are six SRAAs, each headed by a Chief Director (ylijohtaja) nominated by the government. The study had four main findings. First, despite ambiguity in institutional terminology, classifications, boundaries and identities concerning the SRAA, one can discern few true functional or structural deficiencies. Second, the SRAA is a hybrid between an institution of its own and a territorial representative of either government ministries or government agencies, to which is related the fact that each SRAA has both responsibilities concerning its territory and nationwide responsibilities. Third, tensions between performance and institutional legitimation prevail in the institution of the SRAA, but again without serious deficiencies. Fourth, the 2010 substitution of the SRAA for the former Province comprised a radical institutional change. The 2015–2019 Finnish government intended to abolish the SRAAs, but the subsequent government abandoned that reform, and ultimately by mid-2020 it became clear that the institution of the SRAA was here to stay after all.